1 General matter on the dissertation



Introduction to the topic 

The theory of formal languages plays an important role in contemporary
mathematics. It is closely connected to such fundamental disciplines as
algebra, logic, and combinatorics. The theory of formal languages also
cannot be separated from automata theory, which studies acceptors and
transducers of languages. Many algorithmic problems are formulated, or can
easily be reformulated, as problems about formal languages. Formal languages
have a number of applications in computer science (programming languages and
compilers, software and hardware verification, data compression,
cryptography, computer graphics, etc.), in linguistics (natural language
processing, computer analysis of semantics, machine translation,
dictionaries), and in biology (analysis of DNA sequences, structure of
proteins, population dynamics, neural nets, membrane computation).

Formal languages are studied from different points of view. We point
out five approaches; research on formal languages often combines elements
of several of them. Within the algebraic approach, operations on
languages, equations in words and languages, morphisms, congruences, and
identities are studied. Also, there are some specific "algebraic" languages,
e.g., the language of minimal terms of an arbitrary fixed algebra. Within
the logical approach, formal languages are just sets of formulas of different
logics (usually, of the FO or SO logic with some restrictions and/or
extensions). So, the main task in the logical approach is to capture the
properties of languages with logical formalism. Another approach is to study
languages by means of generating systems (such as grammars) and accepting
or transducing machines. Within the structural approach, the properties of
words are analyzed. Thus, a language is considered as the set of words
defined by a common structural property. Finally, decidability and
computational complexity of algorithmic problems about words and languages
are studied within the algorithmic approach. Note that combinatorial methods
are widely used in all approaches.

The five approaches mentioned above have a key point in common. All of them
use a quantitative characteristic of a formal language called combinatorial
complexity¹. The combinatorial complexity of a language L is the most natural
counting function associated with L. This function returns the number of
words in L of length n and is denoted by C_L(n).
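
As a small illustration of this definition, the following Python sketch counts the words of each length by brute force. The names `complexity` and `avoids_aa` are illustrative and do not come from the dissertation; the enumeration is exponential in n and is only meant to make the counting function concrete.

```python
from itertools import product

def complexity(is_member, alphabet, n):
    """Return C_L(n): the number of words of length n accepted by is_member."""
    return sum(1 for letters in product(alphabet, repeat=n)
               if is_member("".join(letters)))

def avoids_aa(word):
    """Hypothetical example language: binary words with no factor 'aa'."""
    return "aa" not in word

# Complexity of the example language for n = 0, ..., 5.
values = [complexity(avoids_aa, "ab", n) for n in range(6)]
```

For this example language the values follow the Fibonacci recurrence, since a word avoiding "aa" either ends in "b" or ends in "ba".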

1 Different approaches use different terminology. The terminology we adopt here is
consistent and can hardly be misunderstood.
• In the study of algebras, it is often useful to estimate the growth of an
algebra, i.e., the combinatorial complexity of the language of minimal
terms. Growth problems were studied for groups, semigroups, rings,
modules, and some other types of algebras. A (far from complete) list
of papers on this topic includes […]. The most remarkable result on the
growth of groups is Gromov's theorem [100], stating that a finitely
generated group has polynomial growth if and only if it is
nilpotent-by-finite. Concerning the growth of noncommutative algebras, we
should mention the book by Krause and Lenagan [122].

• An important characteristic of a logical formula is the number of
nonisomorphic finite models of a given fixed type and given size. If the
models are words, then one gets the combinatorial complexity of the
language defined by the formula. This characteristic is often calculated
for other types of models, e.g., for graphs, see the book [95]. We give
just two examples. First, Fagin [78] established that any set of graphs
defined by an FO formula has either density 1 or density 0 in the set of
all graphs. This paper started an extensive study of 0-1 laws on graphs.
Second, we mention the investigations of the growth of hereditary (closed
under induced subgraphs) classes of graphs. Such classes are direct
analogues of factorial languages, which are quite common objects in
the studies of combinatorial complexity. E.g., it is known […]
that only six types of growth (constant, polynomial, exponential, and
three factorial ones) are possible for hereditary classes of graphs².

• Grammars are also closely connected to combinatorial complexity.
Chomsky and Schützenberger established […] that if a language is
regular (i.e., is generated by a right-linear grammar), then its
combinatorial complexity satisfies some linear homogeneous recurrence
relation with constant coefficients and thus has a rational generating
function. Further, they proved that the generating function of the
combinatorial complexity of any unambiguous context-free language is
algebraic. The latter result was later complemented by Flajolet [80], who
showed that such a generating function for an ambiguous context-free
language can be transcendental³.

2 The logical approach to languages has also generated another important quantitative
characteristic. The descriptive complexity of a language equals the size of the minimal
model of a given type generating the language. This characteristic is inspired by the
notion of Kolmogorov complexity, see the book [127].

3 A remarkable result in the converse direction was proved in 2010 within the logical
approach: any function f : N_0 → N_0 satisfying a linear homogeneous recurrence relation
• Within the structural approach, the subword complexity functions are
studied for infinite words. Subword complexity is just the combinatorial
complexity of the set of finite factors of an infinite word⁴.
The first results on subword complexity (and, in fact, on combinatorial
complexity in general) were obtained by Morse and Hedlund in 1938-1940
[137, 138]. A systematic study of subword complexity was initiated by
Ehrenfeucht and G. Rozenberg, see […]. We point out a nice classification
of morphisms w.r.t. the subword complexity of their fixed points, given by
Pansiot [147]. In addition, there are several other counting functions on
infinite words, such as palindromic, arithmetic, pattern, maximal pattern,
and permutational complexities, see […].

• Apart from the simple fact that the cost of brute force search
algorithms depends on the size of the searched language, the connection of
the algorithmic approach to combinatorial complexity is not so obvious. A
nontrivial example of such a connection is given in the dissertation.

During the last decades, many papers on combinatorial complexity have been
published. A deep study of subword complexity of infinite words (in addition
to the above references, see […]) resulted in satisfactory answers to the
most natural questions. Apart from infinite words, most papers about
combinatorial complexity concern just a single language each, see […].
The other results, see […], look rather scattered.
There is a clear need for a unified theory that

• explains connections between the structure and the growth properties 
of a language, 

• provides algorithms and formulas to find or approximate the parameters 
of growth of a language, 

• predicts the impact of the variations in the properties defining a lan- 
guage on its combinatorial complexity. 

We are going to make some steps towards the construction of such a theory. 
In order to do this, we developed the following program. 

with constant coefficients equals the difference of combinatorial complexities of two regular
languages [121].

4 For infinite words, a topological approach is also quite useful. This approach includes
the study of a function that is similar to subword complexity and is called the entropy
of an infinite word.
Research program



In what follows, "complexity" of a language always means combinatorial
complexity. To study the complexity of a language L we should have an
algorithm deciding whether w ∈ L for any given word w. The existence of such
an algorithm means exactly that L is recursive. So, all further
considerations are within the class Rec of recursive languages. When we speak
about a class of languages, we mean the intersection of this class with Rec.
Classes of languages considered in the dissertation are presented in Fig. 1.
The main objects of study are marked in this figure by thick lines.




[Figure 1: diagram of language classes. Node labels: Rec (recursive),
AFact (antifactorial), CS (context-sensitive), PF (power-free),
CF (context-free), PtF (pattern-free), Reg (regular), APF (Abelian
power-free), Pref (prefix-closed), W (sets of factors of infinite words),
Fact (factorial), MP (sets of minimal powers), FAD (with finite
antidictionary).]

Figure 1: Classes of languages considered in the dissertation. Main [resp.,
secondary] objects of study are drawn by thick [resp., thin] lines. The two
middle classes of the Chomsky hierarchy are drawn by dashed lines to indicate
that we do not study them in general.

Studying a language, we are interested in the asymptotic behaviour of
complexity rather than in its precise values. In particular, we consider
finite languages as a degenerate case. Thus, "degenerate" intersections of
the classes of languages are not represented in Fig. 1. The main parameter
of the asymptotic behaviour of complexity is the growth rate of a language,

    Gr(L) = lim_{n→∞} (C_L(n))^{1/n}.

To compare functions, we use the standard O, Ω, and Θ notation.
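
To make the definition of the growth rate concrete, here is a minimal sketch that counts the words accepted by a dfa by dynamic programming and reports C_L(n)^{1/n}. The dfa encoding (a dict of transitions), the function names, and the example automaton are assumptions made for illustration. The n-th root converges slowly, so this is a numerical illustration only, not one of the algorithms of the dissertation.

```python
import math

def count_words(transitions, start, accepting, n):
    """Number of words of length n accepted by a partial dfa.

    transitions maps (state, letter) -> state; a missing key means "no move".
    """
    paths = {start: 1}          # paths[q] = number of words leading to state q
    for _ in range(n):
        nxt = {}
        for (state, _letter), target in transitions.items():
            if state in paths:
                nxt[target] = nxt.get(target, 0) + paths[state]
        paths = nxt
    return sum(c for q, c in paths.items() if q in accepting)

def growth_estimate(transitions, start, accepting, n):
    """Approximate Gr(L) by C_L(n)^(1/n), computed through logarithms."""
    c = count_words(transitions, start, accepting, n)
    return math.exp(math.log(c) / n) if c > 0 else 0.0

# Hypothetical example: binary words avoiding the factor "aa".
# State 0: the last letter is not 'a'; state 1: the last letter is 'a'.
AVOID_AA = {(0, "a"): 1, (0, "b"): 0, (1, "b"): 0}
```

For this example the estimate approaches the golden ratio from above as n grows, because C_L(n) is a Fibonacci number.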

• Regular languages constitute one of the most important classes of
languages and have a number of equivalent definitions (defined by regular
expressions, recognized by finite monoids, expressed in monadic
SO logic, generated by right-linear grammars, recognized by finite
automata, and so on). The main theorem on the complexity of regular
languages can be obtained by putting together several results from the
book by A. Salomaa and Soittola [157]. Slightly simplifying, we can
state this theorem as follows. For each regular language L there is a
number r ∈ N, and for each j = 0, …, r−1 there exist a real polynomial
p_j(n) and algebraic real numbers α_j, γ_j such that α_j = γ_j = 0 or
0 ≤ γ_j < α_j, and

    C_L(n) = p_{n'}(n) α_{n'}^n + O(γ_{n'}^n), where n' = n mod r. (1)

The only, but significant, disadvantage of the above description is the
lack of connection between the description and the properties of the
language L (or the parameters of the construction which defines L). As
a result, no efficient algorithms to calculate the asymptotic parameters
of C_L(n) were known except for the folklore algorithm to calculate Gr(L)
(i.e., the maximum of the numbers α_j). This algorithm is polynomial
but not efficient enough for practical calculations. Thus, considering
deterministic finite automata (dfa's) as the most convenient and natural
way to represent regular languages, it is natural to state the following
problems.

Reg1: for dfa's, describe the properties that are responsible for the
parameters of asymptotic behaviour of the complexity of the corresponding
regular languages;

Reg2: describe possible oscillations of complexity for regular languages;

Reg3: find an efficient algorithm to calculate, up to the Θ-class, the
complexity of a language from a dfa recognizing this language.

• Factorial languages are the languages closed under taking factors of
words. The class of factorial languages is wide. In particular, it contains
the languages of minimal terms of algebras, the languages of factors
of infinite words, and the languages defined by avoidance properties of
words. The antidictionary of a factorial language L consists of all words
that are minimal w.r.t. the containment order among the words from
the complement of L⁵. Clearly, L is determined by its antidictionary.

In […], factorial languages of bounded complexity are studied. No
general results on the complexity of factorial languages are known. To
study the complexity of factorial languages, it is convenient to use
the method of regular approximations, described in the second part of this
paper. This method uses the regular languages with the same local
structure of words as in the target language. The following problems
arise.

Fact1: study the convergence of the method of regular approximations
and the restrictions on the use of this method;

Fact2: find a nontrivial example of a factorial language such that the
exact growth rates can be found for all regular approximations of this
language;

Fact3: find language transformations preserving growth rates of factorial
languages.

• Languages with finite antidictionary (FAD-languages) are contained in
the intersection of the two previous classes. These are exactly the
languages that serve as regular approximations of factorial languages.
In most cases FAD-languages are given by their antidictionaries. We
mention the Goulden-Jackson cluster method […], which builds the
generating function for the complexity of any FAD-language from its
antidictionary. But this method is too time-consuming to process large
antidictionaries, and hence to obtain sharp bounds for the complexity
of factorial languages through their regular approximations. The
following problems arise naturally.

FAD1: find all, up to the Θ-class, possible complexities of FAD-languages;

FAD2: find which transformations of antidictionaries preserve the
asymptotic parameters of complexity of FAD-languages;

FAD3: characterize the dfa's recognizing FAD-languages.

• Power-free languages constitute a well-known class of factorial
languages. They have been extensively studied since the seminal papers by

5 The complement of L is an ideal of the free monoid over the alphabet of L, and the
antidictionary is the minimal generating set of this ideal.
Thue [162, 163]. Let w be a word of length n, and let β > 1. The
β-power of w is the word

    w^β = w…w w' (with ⌊β⌋ copies of w) of length ⌈βn⌉, where w' is a prefix of w.

A word is β-free [β⁺-free] if it contains no β-powers [resp., no
β'-powers satisfying β' > β]. The language L(k,β) [L(k,β⁺)] consists
of all β-free [resp., β⁺-free] words over the k-letter alphabet⁶. For a
fixed alphabet, the size of a power-free language grows as β increases.
Hence, there exists a repetition threshold RT(k) separating finite and
infinite k-ary power-free languages. The values of RT(k) were conjectured
by Dejean in 1972 […]. Namely, RT(3) = 7/4, RT(4) = 7/5, and
RT(k) = k/(k−1) otherwise. The proof of Dejean's conjecture was finished
in 2009, see […]. The known results on the complexity of power-free
languages are related to a few particular languages, see
the survey [21]. More than ten papers were devoted to the growth rate
of the language L(3, 2): the best upper bound was obtained by Ochem
and Reix [144], and the best lower bound was given by Kolpakov [119].
The most interesting feature found so far is the "polynomial plateau"
of complexity, discovered by Karhumäki and Shallit in the binary case:
all power-free languages between L(2, 2⁺) and L(2, 7/3) have
polynomial complexity (and, moreover, quite close orders of polynomial
growth). The complexity of L(2, 2⁺) was estimated with increasing
precision in [36, 117, 126, 153]; the final result was obtained in [108].
The following problems on power-free languages should be considered.

PF1: find a property (of powers) that can explain the existence of the
polynomial plateau;

PF2: prove a connection between low combinatorial and low computational
complexity, solving the context equivalence problem⁷ for the
language L(2, 2⁺) from the polynomial plateau;

PF3: build universal algorithms to estimate the growth rates of
power-free languages both from above and from below;

PF4: describe the growth rate of the languages L(k,β) as a function of
k and β;

6 It is convenient to consider β⁺ as a "number" such that the inequalities x ≤ β and
x < β⁺ are equivalent. Once the set of powers is extended in this way, we use only the
notation L(k, β).

7 This problem, which is a version of the word problem, will be introduced in the second
part of this paper.
PF5: describe structural properties of the minimal infinite power-free
languages over different alphabets (threshold languages).

• Languages of minimal powers are exactly the antidictionaries of
power-free languages. Any such language is antifactorial, i.e., constitutes
an antichain w.r.t. the containment order. Antifactorial languages are
closely connected to the factorial ones, but their complexity
behaves irregularly and is completely unexplored. Here we state only
one problem; it significantly generalizes Problem 1.12 of [8].

MP: for any language of minimal powers, describe the set of zeroes of
its complexity.
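
The notions of β-free and β⁺-free words from the list above can be illustrated by a short brute-force check. The sketch assumes the usual convention that a word of length ℓ with minimal period p has exponent ℓ/p, so that a word is β-free when all its factors have exponent less than β, and β⁺-free when all exponents are at most β. The function names are illustrative, and the check is far too slow for anything but short words.

```python
from fractions import Fraction

def minimal_period(word):
    """Smallest p >= 1 such that word[i] == word[i + p] for all valid i."""
    n = len(word)
    for p in range(1, n + 1):
        if all(word[i] == word[i + p] for i in range(n - p)):
            return p
    return 0  # only reached for the empty word

def max_exponent(word):
    """Largest value len(u) / period(u) over all nonempty factors u of word."""
    best = Fraction(0)
    for i in range(len(word)):
        for j in range(i + 1, len(word) + 1):
            factor = word[i:j]
            best = max(best, Fraction(len(factor), minimal_period(factor)))
    return best

def is_power_free(word, beta, plus=False):
    """True if word is beta-free (plus=False) or beta^+-free (plus=True)."""
    exponent = max_exponent(word)
    return exponent <= beta if plus else exponent < beta
```

For instance, `is_power_free(w, 2)` tests square-freeness, and `is_power_free(w, 2, plus=True)` tests overlap-freeness.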

Aim of dissertation 

The dissertation is aimed at the development of new approaches to the study
of combinatorial complexity. We apply these approaches to different classes
of formal languages

• to discover and estimate the impact of the properties of languages (and 
of the structures associated with languages) on complexity; 

• to provide algorithms and formulas estimating the complexity for wide 
classes of languages; 

• to discover connections between combinatorial and computational com- 
plexity. 

Particular goals of the dissertation are solutions to the fifteen problems men- 
tioned in the research program. 

Methods 

The methods used in the proofs of the obtained results can be grouped as 
follows. 

• Methods of combinatorics of words, based on the properties of periodic
words, the properties of Thue-Morse words and the Thue-Morse morphism,
construction and analysis of morphisms, encodings, circular and
two-dimensional words.

• Methods of automata theory, in particular, an original construction of
web-like and generalized web-like automata. This construction allows us to
prove several theorems which have nothing in common at first glance.

• Methods of matrix theory, based on the Perron-Frobenius theorem and
related properties of nonnegative matrices. We also use the Jordan
normal form, the Hamilton-Cayley theorem, and the calculation of
determinants of variable size.

• Methods of graph theory, including equitable partitions, analysis of 
strongly connected components, and some spectral properties of graphs. 

We also make use of some classical combinatorial algorithms such as
Tarjan's algorithm for finding strong components of a digraph and the
Aho-Corasick algorithm for pattern matching. Finally, we use a computer to
calculate numerical bounds on complexity, to search for examples, and to
carry out routine computations in some proofs.

Size and structure of dissertation. Publications 

The dissertation (287 pages) consists of an introduction (Sect. 1°-3°), four
chapters (§§ 1-20), a bibliography, and an index. The results constituting
the dissertation are published in the papers [176-200]. In addition, the
manuscripts [201-204] are submitted or will be submitted soon. The papers
[186, 193, 196, 197] are the extended versions of [183, 187, 191, 195],
respectively.
2 Results

We solved, completely or partially, all fifteen problems mentioned in the
research program. The results are highly connected with each other: more
than twenty statements are used outside the sections where they are proved.
These connections demonstrate the possibility of building a unified theory
of combinatorial complexity. Now we start to describe the results.
Chapter 1 (§§ 1-4). Regular languages

According to formula (1) given above, the asymptotics of complexity is
fully described by the set of r asymptotic functions p_j(n) α_j^n (or, up to
Θ-equivalence, n^{m_j} α_j^n). The parameter α_j of the fastest growing
asymptotic function equals Gr(L), while the parameter m_j = Pd(L) is the
so-called polynomial index of L⁸.

We consider finite automata as labeled digraphs. A dfa is consistent if
for any of its states (vertices) there is an accepting path containing this
state. The proofs of most statements in Chapter 1 rely on the structure
and mutual location of strong components (i.e., maximal strongly connected
subgraphs) of consistent dfa's. First we give the polynomiality criterion.

Theorem 1 [181] Suppose that a language L is recognized by a consistent
dfa A. Then

(1) if A is acyclic, then L is finite;

(2) if A contains two cycles sharing a common vertex, then L has exponential
complexity;

(3) if A contains cycles but any two of them are disjoint, then L has
polynomial complexity and Pd(L) = m−1, where m is the maximum number of
cycles intersected by a single walk in A.

Corollary 1 [181] If a regular language L over k letters is recognized by a
consistent dfa with N vertices, then it is decidable in O(Nk) time whether the
complexity of L is polynomial or exponential. In the first case, the
polynomial index of L can also be found in time O(Nk).
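
The criterion of Theorem 1 can be sketched in code as follows. The sketch assumes that the consistent dfa is given just as a digraph (a dict mapping each vertex to the list of targets of its outgoing edges, one entry per letter) and uses a recursive version of Tarjan's algorithm, which is adequate for small examples only. The names `strong_components` and `classify` are illustrative; this is not claimed to be the exact procedure behind Corollary 1.

```python
def strong_components(graph):
    """Tarjan's algorithm; components are returned sinks first."""
    index, low, on_stack, stack, comps = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of a component
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            comps.append(comp)

    for v in graph:
        if v not in index:
            visit(v)
    return comps

def classify(graph):
    """Return ('finite', None), ('polynomial', index) or ('exponential', None)."""
    comps = strong_components(graph)
    comp_of = {v: i for i, comp in enumerate(comps) for v in comp}
    best = [0] * len(comps)   # max number of cycles met by a walk starting in comp i
    for i, comp in enumerate(comps):
        inner = sum(1 for v in comp for w in graph.get(v, ()) if w in comp)
        if inner > len(comp):
            return ("exponential", None)    # two cycles share a vertex
        below = max((best[comp_of[w]] for v in comp
                     for w in graph.get(v, ()) if w not in comp), default=0)
        best[i] = below + (1 if inner > 0 else 0)
    m = max(best, default=0)
    return ("finite", None) if m == 0 else ("polynomial", m - 1)
```

A strong component with as many inner edges as vertices is a single cycle, while more inner edges than vertices force two cycles through a common vertex; this is the observation the sketch relies on.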

Next, consider the problem of finding the growth rate Gr(L). Recall that
the index Ind(G) of a graph G is the Frobenius root⁹ of the adjacency matrix
of G. A folklore result says that Gr(L) = Ind(A) for any consistent dfa A
recognizing L. In general, the Frobenius root of an adjacency matrix cannot
be found exactly but can be approximated with the absolute error δ for any
δ > 0. A straightforward computation uses the characteristic polynomial of
the matrix and requires Θ(N^4) operations and Θ(N^3) additional space. The
following theorem radically improves this situation.

Theorem 2 [185, 196] Suppose that a language L over k letters is recognized
by a consistent dfa A with N vertices. There is an algorithm which,

8 The polynomial index can be defined by the inequality
0 < limsup_{n→∞} C_L(n) / (n^{Pd(L)} Gr(L)^n) < ∞. Besides the class of regular
languages, we consider polynomial indices only for the languages of polynomial
complexity.

9 The Frobenius root, i.e., the eigenvalue of maximal absolute value of a nonnegative
matrix, is one of the most important spectral characteristics of a graph.
given A and a number δ, 0 < δ < 1, calculates Gr(L) with the absolute error
at most δ in time Θ(log(1/δ)·Nk) using Θ(log(1/δ)·N) additional space.

The mentioned algorithm (Algorithm R, [196]) plays an important role
in the dissertation. Note that it can be used to calculate the index of any
graph, as the following theorem shows.

Theorem 3 [196] Let G be a digraph with n vertices and m edges. There
is an algorithm which, given G and a number δ, 0 < δ < 1, calculates Ind(G)
with the absolute error at most δ in time Θ(log(1/δ)·m) using Θ(log(1/δ)·n)
additional space.
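
For orientation, the index of a small strongly connected digraph can be bracketed by a textbook power iteration. The sketch below is not Algorithm R and makes no claim about its running time; it assumes a dense adjacency matrix `adj` (a hypothetical input format) and uses the classical Collatz-Wielandt bounds for nonnegative matrices, applied to A + I so that periodic graphs do not prevent convergence.

```python
def index_bounds(adj, iterations=200):
    """Lower and upper bounds for the Frobenius root of a strongly connected digraph.

    adj[i][j] is the number of edges i -> j.  We iterate x -> (A + I) x and use
    min_i (Bx)_i / x_i <= rho(B) <= max_i (Bx)_i / x_i, valid for B = A + I and
    any positive vector x; rho(A) = rho(B) - 1.
    """
    n = len(adj)
    x = [1.0] * n
    lo, hi = 0.0, float("inf")
    for _ in range(iterations):
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        ratios = [y[i] / x[i] for i in range(n)]
        lo = max(lo, min(ratios) - 1.0)
        hi = min(hi, max(ratios) - 1.0)
        scale = max(y)
        x = [value / scale for value in y]   # normalise to avoid overflow
    return lo, hi
```

Each iteration costs time proportional to n² here; a sparse edge list would bring this down to the number of edges per iteration.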

Next we estimate the number of asymptotic functions, using the following
technical notion. A strong component C of a consistent dfa A is important if
there are infinitely many numbers n_i such that (a) there is an accepting
walk of length n_i intersecting C and (b) there is no accepting walk of
length n_i intersecting a strong component with index greater than Ind(C).
Recall that the imprimitivity number of a digraph is the greatest common
divisor of the lengths of all its cycles. To obtain the following theorem we
give a direct proof of formula (1) by means of matrix theory.

Theorem 4 [194] Suppose that a language L is recognized by a consistent
dfa A, and r is the least common multiple of the imprimitivity numbers of all
important strong components of A. Then the complexity of L can be described
by r asymptotic functions.

The following theorem describes the polynomial index of a regular language
in the general case.

Theorem 5 [185, 194] Suppose that a language L is recognized by a consistent
dfa A, and m is the maximum number of strong components of index Gr(L)
intersected by a single walk in A. Then Pd(L) = m−1.

To calculate the polynomial index, we need to prove or disprove the equality
of the indices of two digraphs in the case when these indices are equal up to
the approximation error.

Proposition 1 [194] Let A_{k,N} be the set of all consistent dfa's having
at most N vertices and acting over the k-letter alphabet. If A, B ∈ A_{k,N},
then the equality of the numbers Ind(A) and Ind(B) can be verified in time
O(N^4 + log(1/δ(N))·N^2), where δ(N) is the minimum nonzero difference of
indices of two dfa's from A_{k,N}.

The proof of Theorem 4 provides a way to get the parameter r and allows
one to reduce the calculation of the numbers α_j, m_j for j = 0, …, r−1 to
the calculation of the growth rates and polynomial indices of some subgraphs
of the dfa A. Thus, Problems Reg1 and Reg3 are completely solved.

The function f is called oscillating if the ratio f(n+1)/f(n) has
no limit as n → ∞. If, moreover, limsup_{n→∞} f(n+1)/f(n) = ∞ or
liminf_{n→∞} f(n+1)/f(n) = 0, then f is said to be wild. The oscillations
of complexity for arbitrary, prefix-closed, and factorial regular languages
(Problem Reg2) are described in

Theorem 6 [185, 194] All possible types of combinatorial complexity for
arbitrary, prefix-closed, and factorial regular languages w.r.t. the
oscillation property are listed in the following table (where W = wild,
O = oscillating, N = non-oscillating, α = Gr(L), m = Pd(L)):

    Regular languages   α=1, m=0   α=1, m>0   α>1, m=0   α>1, m>0
    Arbitrary           W, O, N    W, O, N    W, O, N    W, O, N
    Prefix-closed       O, N       O, N       O, N       O, N
    Factorial           O, N       N          O, N       O, N

We finish the survey of Chapter 1 with the following property, which is
distinctive for regular languages.

Proposition 2 [185, 194] An arbitrary regular language L has the same
growth rate and polynomial index as its closures under taking prefixes,
suffixes, and factors. Moreover, since such closures are not wild languages,
each of them has the complexity Θ(n^{Pd(L)} Gr(L)^n).

Chapter 2 (§§ 5-11). Factorial languages. FAD-languages

In this chapter we study general problems about factorial languages together
with the problems about FAD-languages. First we describe the method of
regular approximations (§5). Each factorial language L over some alphabet
Σ has an antifactorial antidictionary M = (Σ* − L) ∩ LΣ ∩ ΣL. We choose
an arbitrary sequence {M_i} of finite subsets of M such that

    M_1 ⊆ M_2 ⊆ … ⊆ M_i ⊆ … ⊆ M,   ⋃_{i=1}^∞ M_i = M

(for instance, M_i = M ∩ Σ^{≤i}). The FAD-languages L_i with the
antidictionaries M_i are regular approximations of L. We have

    L ⊆ … ⊆ L_i ⊆ … ⊆ L_1,   ⋂_{i=1}^∞ L_i = L.

One can check that lim_{i→∞} Gr(L_i) = Gr(L). By Theorem 2, there is an
algorithm that successively calculates, for any factorial language L, the
members of a decreasing sequence that converges to Gr(L). The convergence
rate of such a sequence for some classes of factorial languages is really
high, see below the results of Chapter 3.

To give a more detailed analysis of Problem Fact1, we pay attention to
the following questions. Let L be an arbitrary factorial language. First,
if L has polynomial complexity, can the polynomial index of L be found or
approximated by means of regular approximations? Second, can one estimate
the complexity of L using some approximations of L by regular languages
from below? In an important particular case, the following proposition gives
negative answers to both questions.

Proposition 3 [182] If all words in an infinite factorial language L are
β-free for some number β, then all regular approximations of L have
exponential complexity and all regular subsets of L are finite.

Then we define and analyze FAD-automata, which are "canonical" dfa's
recognizing FAD-languages. FAD-automata are constructed from antidictionaries
in linear time by a version of the textbook Aho-Corasick algorithm
for pattern matching, see [52].
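
The construction can be sketched along the lines of the standard textbook Aho-Corasick automaton. The code below is an illustration under that assumption and may differ in details from the FAD-automata of the dissertation; the names `fad_automaton` and `count_words` are illustrative, and the forbidden words are assumed to be nonempty words over the given alphabet.

```python
from collections import deque

def fad_automaton(antidictionary, alphabet):
    """Build a partial dfa (a dict) accepting the words that avoid all forbidden words.

    States are trie nodes of the antidictionary (0 is the initial state); every
    transition that would complete a forbidden word is omitted.
    """
    goto, terminal = [{}], [False]
    for word in antidictionary:                 # build the trie
        node = 0
        for letter in word:
            if letter not in goto[node]:
                goto.append({})
                terminal.append(False)
                goto[node][letter] = len(goto) - 1
            node = goto[node][letter]
        terminal[node] = True

    fail = [0] * len(goto)                      # failure links, filled breadth-first
    queue = deque()
    for letter in alphabet:
        if letter in goto[0]:
            queue.append(goto[0][letter])
        else:
            goto[0][letter] = 0
    while queue:
        node = queue.popleft()
        terminal[node] = terminal[node] or terminal[fail[node]]
        for letter in alphabet:
            if letter in goto[node]:
                child = goto[node][letter]
                fail[child] = goto[fail[node]][letter]
                queue.append(child)
            else:
                goto[node][letter] = goto[fail[node]][letter]

    return {(q, a): t for q in range(len(goto)) if not terminal[q]
            for a, t in goto[q].items() if not terminal[t]}

def count_words(delta, n, start=0):
    """Number of words of length n that can be read from the start state."""
    paths = {start: 1}
    for _ in range(n):
        nxt = {}
        for (q, _letter), t in delta.items():
            if q in paths:
                nxt[t] = nxt.get(t, 0) + paths[q]
        paths = nxt
    return sum(paths.values())
```

All states of the resulting automaton are accepting, so counting walks from the initial state gives the complexity of the FAD-language.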

After this, we solve Problem Fact2 (§6). As a target language, we take
the Thue-Morse language TM. It consists of all factors of the Thue-Morse
word, which is the fixed point of the binary morphism θ defined by the rule
θ(a) = ab, θ(b) = ba. The following proposition describes the antidictionary
of the Thue-Morse language.

Proposition 4 [181] The antidictionary of the language TM is the set

    M = {aaa, bbb} ∪ {c θ^i(aba) a, d θ^i(bab) b, c θ^i(bab) a, d θ^i(aba) b | i ≥ 0},

where c, d are the last letters of θ^i(a) and θ^i(b), respectively.

The antidictionary M contains words of length 3 and of length 3·2^i + 2 for
any i ≥ 0. Let M_i = M ∩ {a,b}^{≤ 3·2^i+2} and additionally
M_{−1} = {aaa, bbb}. The growth rates of the corresponding regular
approximations are given by the following formula (φ denotes the golden
ratio).

Theorem 7 [181] Let L_i be the FAD-language with the antidictionary M_i.
Then Gr(L_i) = φ^{1/2^{i+1}}.
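
The first two regular approximations of TM are small enough for a numerical sanity check. The sketch below assumes the antidictionaries M_{−1} = {aaa, bbb} and M_0 = M_{−1} ∪ {aabaa, bbabb, ababa, babab} (the words of length 5 obtained for i = 0), counts admissible words with a simple sliding-window automaton rather than a FAD-automaton, and estimates the growth rate as (C(2n)/C(n))^{1/n}. This estimate is a heuristic chosen for these examples, where the constant in front of the exponential is the same at n and 2n; the expected values are φ and φ^{1/2}.

```python
import math
from itertools import product

def growth_rate(forbidden, alphabet="ab", n=200):
    """Estimate Gr of the language avoiding `forbidden` as (C(2n)/C(n))^(1/n)."""
    m = max(len(f) for f in forbidden)

    def admissible(word):
        return not any(f in word for f in forbidden)

    # counts[s] = number of admissible words of the current length with suffix s,
    # where s ranges over admissible words of length m - 1.
    counts = {"".join(p): 1 for p in product(alphabet, repeat=m - 1)
              if admissible("".join(p))}
    values = {}
    for length in range(m - 1, 2 * n + 1):
        values[length] = sum(counts.values())
        nxt = {}
        for suffix, c in counts.items():
            for letter in alphabet:
                if admissible(suffix + letter):
                    key = (suffix + letter)[1:]
                    nxt[key] = nxt.get(key, 0) + c
        counts = nxt
    return math.exp((math.log(values[2 * n]) - math.log(values[n])) / n)

PHI = (1 + math.sqrt(5)) / 2
M_MINUS_1 = ["aaa", "bbb"]
M_0 = M_MINUS_1 + ["aabaa", "bbabb", "ababa", "babab"]
```

Calling `growth_rate(M_MINUS_1)` and `growth_rate(M_0)` gives values that can be compared with `PHI` and `PHI ** 0.5`.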

In § 7, we build two two-parameter series of FAD-automata: web-like and
generalized web-like automata. They are used to prove Theorems 8-10. We
call a language L ⊆ Σ* symmetric if it is closed under all automorphisms of
the free monoid Σ*. Problem FAD2 for the case of polynomial complexity is
solved by

Theorem 8 [188] For any non-unary alphabet Σ and any integer m ≥ 0,
there exist both symmetric and non-symmetric FAD-languages over Σ having
the complexity Θ(n^m).

The following quite surprising theorem is proved in §8. It shows that 
regular approximations of polynomial complexity cannot be used to find the 
polynomial index of the approximated language. 

Theorem 9 [184] For any non-unary alphabet Σ and any integers s and
m such that 1 ≤ s ≤ m, there exists a factorial language over Σ having the
complexity Θ(n^s) and such that almost all members of any sequence of its
regular approximations have the complexity Θ(n^m).

Thus, the sequences of regular approximations of polynomial complexity 
have a "non-compactness" property: polynomial indices of approximations 
can stabilize arbitrarily far from the polynomial index of the target language. 
The only exception concerns the languages of bounded complexity. 

Proposition 5 [184] Almost all regular approximations of any factorial
language of complexity O(1) have the complexity O(1).

In §9, we use the FAD-languages recognized by web-like and generalized
web-like automata to approximate factorial languages from below. Thus, we
answer the second question about regular approximations in the affirmative,
concluding the study of Problem Fact1. Namely, we prove Theorem 10 [189],
stating that some languages defined by natural conditions
have intermediate (i.e., more than polynomial, but less than exponential)
complexity. Factorial languages given by simple properties and having
intermediate complexity are quite rare. So, the languages we have found are
of certain interest. Let us describe one of two infinite series of such
languages.

The representation $w = a_1^{m_1} a_2^{m_2} \cdots a_t^{m_t}$, where $a_i \ne a_{i+1}$ for all $i$, is the
power factorization of a word $w$. The mentioned series consists of the
languages of all words (over some fixed alphabet) satisfying the following two
conditions on their power factorization:

- the letters follow each other in accordance with some cyclic order;

- $m_1 \le m_2 \le \dots \le m_{t-1}$.
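
The two conditions can be illustrated by a small brute-force sketch. It assumes that the second condition requires the exponents of all blocks except the last one to be non-decreasing, and it encodes a cyclic order as a string in which the successor of the last letter is the first one. The function names are illustrative and are not taken from the dissertation.

```python
from itertools import groupby, product

def power_factorization(word):
    """Return [(letter, exponent), ...] with adjacent letters distinct."""
    return [(letter, len(list(run))) for letter, run in groupby(word)]

def in_series_language(word, cyclic_order):
    """Check both conditions for one word.

    cyclic_order is a string such as "ab" or "abc"; the letter following
    cyclic_order[-1] is cyclic_order[0] (an assumed encoding).
    """
    blocks = power_factorization(word)
    successor = {c: cyclic_order[(i + 1) % len(cyclic_order)]
                 for i, c in enumerate(cyclic_order)}
    # condition 1: consecutive blocks follow the cyclic order
    for (a, _), (b, _) in zip(blocks, blocks[1:]):
        if successor[a] != b:
            return False
    # condition 2: exponents are non-decreasing; the last block is unconstrained
    exponents = [m for _, m in blocks[:-1]]
    return all(x <= y for x, y in zip(exponents, exponents[1:]))

def complexity(n, cyclic_order):
    """Number of words of length n in the language (exhaustive search)."""
    return sum(in_series_language("".join(t), cyclic_order)
               for t in product(cyclic_order, repeat=n))

print([complexity(n, "ab") for n in range(1, 6)])  # [2, 4, 8, 14, 24]
```

The exhaustive count is only practical for short words; it is meant to make the definition concrete, not to reproduce the complexity estimates of Theorem 10.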

In § 10, the transformations requested by Problem Fact3 are studied.
Namely, we consider the restriction of a language to its extendable (in one
or both directions) part. A word $w \in L$ is two-sided extendable in $L$ if
there are arbitrarily long words $u$ and $v$ such that $uwv \in L$. One-sided
(say, right) extendability is defined in a similar way. The corresponding sets
of extendable words are denoted by $e(L)$ and $re(L)$, respectively.

Theorem 11 [183, 186] $\mathrm{Gr}(L) = \mathrm{Gr}(re(L)) = \mathrm{Gr}(e(L))$ for any factorial
language $L$.

Extendable parts of a language usually have a simpler structure than the
language itself. Hence, Theorem 11 can be useful for estimating the growth
rates of factorial languages (e.g., we apply Theorem 11 to threshold languages
in § 16). On the other hand, more "subtle" parameters of complexity cannot
be found from the extendable parts of a language, as the following theorem
shows.

Theorem 12 [183, 186] Each of the ratios $C_L(n)/C_{re(L)}(n)$, $C_L(n)/C_{e(L)}(n)$
can be a bounded, polynomial, or intermediate function.

§11, which is the last and biggest section of Chapter 2, is devoted to 
the FAD-languages of exponential complexity and their FAD-automata. The 
growth rate of a regular language L (and, in many cases, other asymptotic pa- 
rameters of complexity) can be found from a C-graph, which is the subgraph 
generated by all nontrivial strong components of a consistent dfa recognizing 
L. So, C-graphs are the main objects of study in this section. Some of the 
results of this section are published in [192], while the others are contained
in [202]. 
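
To make the notion of a C-graph concrete, the following sketch extracts the subgraph induced by all nontrivial strong components of a digraph. It uses Tarjan's algorithm in its textbook recursive form, which is a stand-in chosen for brevity and not the dissertation's Algorithm R. The adjacency-list encoding and the function names are assumptions made for the example.

```python
def strong_components(graph):
    """Tarjan's algorithm; graph maps each vertex to a list of successors."""
    index, low, on_stack, stack, result = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:            # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            result.append(component)

    for v in graph:
        if v not in index:
            visit(v)
    return result

def c_graph(graph):
    """Subgraph induced by the vertices of all nontrivial strong components.

    A component is nontrivial if it contains an edge: it has more than one
    vertex, or it is a single vertex with a loop.
    """
    keep = set()
    for component in strong_components(graph):
        v = next(iter(component))
        if len(component) > 1 or v in graph[v]:
            keep |= component
    return {v: [w for w in graph[v] if w in keep] for v in graph if v in keep}

# dfa for ternary words without two equal adjacent letters:
# 0 is the initial state, states 1-3 remember the last letter read
dfa = {0: [1, 2, 3], 1: [2, 3], 2: [1, 3], 3: [1, 2]}
print(c_graph(dfa))
```

The recursive formulation is adequate for small automata; for large ones an iterative variant would be needed to avoid Python's recursion limit.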

We introduce two transformations of an antidictionary (reduction and 
cleaning). These transformations reduce the size of an antidictionary but 
preserve the $\Theta$-class of complexity of the factorial language with this
antidictionary. This is exactly what is requested in Problem FAD2. Further, we build
Algorithm C that decides whether a given arbitrary nontrivial strongly con- 
nected digraph is a strong component of some FAD-automaton and builds the 
corresponding FAD-language in the case of the affirmative answer. Thus, we 
get an algorithmic description of FAD-automata in terms of forbidden strong 
components (this result partially solves Problem FAD3). Using this descrip- 
tion, we enumerate all possible C-graphs (and hence, all possible growth 
rates) for the case of binary alphabet and "small" FAD-automata. This is 
our Theorem 13 [192], solving Problem FAD1 for a particular case.

For bigger classes of FAD-languages, the enumeration of growth rates is
hardly possible, so a deciding algorithm would be the best solution we can
hope for in Problem FAD1. On the basis of Algorithm C, we construct
Algorithm G, which builds a FAD-automaton having the same growth rate
as the input strongly connected digraph and containing this digraph as a
strong component. This algorithm allows one to construct a FAD-language
with a given growth rate, thus providing a partial algorithmic solution to
Problem FAD1. Unfortunately, we cannot use Algorithm G to prove that a given
algebraic number is NOT a growth rate of a FAD-language. Namely, the
following proposition suggests that it is not possible to pick a
finite family of digraphs with the given index $\alpha$ such that an "unsuccessful"
run of Algorithm G on all instances from this family proves that $\alpha$ is not a
growth rate of a FAD-language.

Proposition 6 An algebraic number $\alpha$ can simultaneously (a) be the
growth rate of a FAD-language over an alphabet $\Sigma$ and (b) be the index of NO
$r$-vertex digraph which is a strong component of a consistent dfa recognizing
a FAD-language over $\Sigma$, where $r$ is the degree of $\alpha$.

At the end of § 11, we study the mutual location of strong components in
FAD-automata. Recall that such a location determines the polynomial index
of a regular language (Theorem 5). The following propositions, proved by
examples, show that from the complexity point of view FAD-languages form
a quite representative subclass of Reg.

Proposition 7 (1) There exist both symmetric and asymmetric binary
FAD-languages having the complexity of type $\Theta(n\alpha^n)$ for some $\alpha > 1$.
(2) For any $k \ge 3$, there exist $k$-ary FAD-languages having the complexity of
type $\Theta(n^{k-2}\alpha^n)$ for some $\alpha > 1$.

Proposition 8 (1) Over the binary alphabet, there exists a FAD-automaton
whose C-graph is not weakly connected.

(2) Over the binary alphabet, there exists a FAD-automaton whose nontrivial
strong components form the $M_2$ poset with respect to reachability.

Chapter 3 (§§ 12-18). Power-free languages

This chapter is devoted to power-free languages except for § 18, in which we 
discuss the extension of our methods to pattern-free and Abelian power-free 
languages. 

In § 12, we solve Problem PF1. The exponent of a word is the ratio between
its length and its shortest period¹⁰. We call an exponent $\beta$ $k$-stable if
there exists a $k$-ary word which has the exponent $\beta$ and is extendable to a
doubly infinite $\beta^+$-free word. The connection between $k$-stability and
complexity is illustrated by the following observation: if the exponent $\beta$ is
not $k$-stable, then $e(L(k,\beta^+)) = e(L(k,\beta))$ and hence
$\mathrm{Gr}(k,\beta^+) = \mathrm{Gr}(k,\beta)$ by Theorem 11.
The following theorem shows that non-2-stable exponents clearly mark out
the polynomial plateau (the exponents $\beta < 2$ correspond to finite binary
languages). Thus, Problem PF1 is solved.
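
As a small illustration of the definition of the exponent, the sketch below computes the shortest period and the exponent of a finite word and tests $\beta$-freeness by brute force. It assumes the usual convention that a word is $\beta$-free if all its factors have exponent less than $\beta$, and $\beta^+$-free if all its factors have exponent at most $\beta$; the function names and the `plus` flag are illustrative.

```python
from fractions import Fraction

def shortest_period(w):
    """Smallest p >= 1 such that w[i] == w[i + p] for all valid i."""
    n = len(w)
    for p in range(1, n + 1):
        if all(w[i] == w[i + p] for i in range(n - p)):
            return p
    raise ValueError("the empty word has no period")

def exponent(w):
    """Ratio between the length of w and its shortest period."""
    return Fraction(len(w), shortest_period(w))

def is_beta_free(w, beta, plus=False):
    """True if every factor has exponent < beta (<= beta if plus is True)."""
    for i in range(len(w)):
        for j in range(i + 1, len(w) + 1):
            e = exponent(w[i:j])
            if e > beta or (e == beta and not plus):
                return False
    return True

print(exponent("abab"), exponent("aabaaba"))   # 2 7/3
print(is_beta_free("abab", 2, plus=True))      # True: abab is overlap-free
```

The test is cubic or worse in the length of the word, which is acceptable for experiments with short words only.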



10 A word of length $n$ over the alphabet $\Sigma$ can be seen as a function
$w\colon \{1, \dots, n\} \to \Sigma$. Periods of $w$ are the periods of this function.

Theorem 14 [180] The exponent $\beta$ is 2-stable if and only if $\beta = 2$ or
$\beta \ge 7/3$.

Corollary 2 For any $\beta \in [2^+, 7/3]$, the language $L(2,\beta)$ has
subexponential complexity.

The last statement was first proved by Karhumäki and Shallit (they
even showed that all these languages have polynomial complexity). In
addition, it was proved in [111] that the complexity of the language
$L(2, (7/3)^+)$ is exponential, that is, the polynomial plateau ends with the
exponent 7/3. But the latter result immediately follows from the fact that the
language $L(3, 2)$ has exponential complexity [25, 27] and the following theorem.

Theorem 15 [180] There exists a morphism $f\colon \{1,2,3\}^* \to \{a,b\}^*$,
mapping any square-free word to a $(7/3)^+$-free word.

The solution to Problem PF2 is given in § 13. A context of a word $u$ in a
language $L$ is a pair $(w_1, w_2)$ of words such that $w_1 u w_2 \in L$. By definition,
the words $u, v \in L$ are context equivalent if the sets of their contexts coincide.
The corresponding decision problem is called the context equivalence problem
(for the language $L$)¹¹. This problem is little studied and seems to be
hard except for regular languages and the factorial languages satisfying the
bounded gap property¹². The solution to the context equivalence problem for
the language $L(2, 2^+)$ of binary overlap-free words is technically involved and
reveals a non-trivial structure of this language. Nevertheless, the resulting
Algorithm E is very fast.

Theorem 16 [201] The context equivalence of two arbitrary binary
overlap-free words can be verified in time linear in their total length.

Corollary 3 [201] The word problem in the syntactic monoid of the
language $L(2, 2^+)$ can be solved in time linear in the total length of the input
words.

The proof of Theorem 16 is quite long. The main steps are

- a linear-time algorithm to check one-sided and two-sided extendability
of an overlap-free word (Theorem 17 [179]);

- Proposition 9 [179], stating the non-equivalence of any two distinct
two-sided extendable words;

11 This problem is quite close to the word problem in the syntactic monoid of the lan- 
guage L. 

12 A language $L$ satisfies the bounded gap property if there is a function $f(n)$ such that
any word from $L$ of length $f(n)$ contains all words from $L$ of length $n$ as factors. See [115]
for the solution to the context equivalence problem for such languages.

- a necessary and sufficient condition of equivalence of one-sided
extendable words (Theorem 18 [179]);

- the reduction of equivalence checking for nonextendable words to the
comparison of finite sets of one-sided contexts [201];

- an algorithm that compares the sets of one-sided contexts of
nonextendable words in linear time [201].

The key role in the whole proof is played by the Thue-Morse morphism. 

The studied problem has such a low time complexity mainly because of
the extremely small set of binary overlap-free morphisms. Namely, the
semigroup of all morphisms preserving the language $L(2, 2^+)$ is generated by the
Thue-Morse morphism and the involution automorphism, see [2"3" |I16U| . The 
same property holds for all binary power-free languages from the polynomial 
plateau [151]. Hence, the above solution to the context equivalence problem
can be applied, after a small correction, to any language from the polynomial 
plateau. 
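
The role of the Thue-Morse morphism can be illustrated by a short experiment. The sketch below applies the morphism $a \mapsto ab$, $b \mapsto ba$ and checks by brute force that it maps short overlap-free words to overlap-free words, which is the classical property behind the statements above. The function names are illustrative, and the check is a plain exhaustive test rather than the linear-time machinery of Algorithm E.

```python
from itertools import product

THUE_MORSE = {"a": "ab", "b": "ba"}

def apply_morphism(word, morphism=THUE_MORSE):
    """Replace every letter of word by its image."""
    return "".join(morphism[c] for c in word)

def has_overlap(w):
    """True if w has a factor of length 2p + 1 with period p (a factor cxcxc)."""
    n = len(w)
    for p in range(1, n // 2 + 1):
        for i in range(n - 2 * p):
            if all(w[j] == w[j + p] for j in range(i, i + p + 1)):
                return True
    return False

# iterating the morphism on a single letter gives prefixes of the Thue-Morse word
w = "a"
for _ in range(6):
    w = apply_morphism(w)
print(len(w), has_overlap(w))

# the morphism preserves overlap-freeness on all short binary words
short_words = ("".join(t) for n in range(1, 9) for t in product("ab", repeat=n))
print(all(not has_overlap(apply_morphism(u))
          for u in short_words if not has_overlap(u)))
```

The involution automorphism mentioned in the text simply exchanges the two letters, so it can be expressed with the same helper as `apply_morphism(u, {"a": "b", "b": "a"})`.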

§§ 14-15 are devoted to the development of algorithms estimating the
complexity of power-free languages (Problem PF3). An algorithm using regular
approximations to obtain upper bounds for the growth rate of factorial
languages should consist of three big steps:

1) calculating the antidictionary of the chosen regular approximation; 

2) building a consistent dfa from the antidictionary; 

3) calculating the growth rate of the regular approximation from the dfa. 

An efficient implementation of steps 2 and 3 is provided by the
above-mentioned version of the Aho-Corasick algorithm and by Algorithm R,
respectively. If we perform step 1 by some optimized exhaustive search, we obtain
an algorithm [185] which allows one to get much better upper bounds than
the algorithms described in [140, 144]. The main flaw of this straightforward
algorithm is the dependence of its time and space expenses on the alphabet:
both include the factor $k!$, where $k$ is the alphabet size. So, it is hardly
possible to process languages over more than
4-5 letters.
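
The three steps can be sketched for ternary square-free words, i.e. for the language $L(3,2)$. This is a minimal sketch of the straightforward scheme under stated assumptions: step 1 is a naive exhaustive search, step 2 is a textbook Aho-Corasick construction, and step 3 is a shifted power iteration used here in place of Algorithm R. All function names are illustrative, and none of the optimizations of Algorithm U are reproduced.

```python
from collections import deque
from itertools import product

ALPHABET = "abc"   # assumption: ternary square-free words, i.e. L(3, 2)

def has_square(w):
    """True if w contains a nonempty factor of the form xx."""
    n = len(w)
    return any(w[i:i + p] == w[i + p:i + 2 * p]
               for p in range(1, n // 2 + 1)
               for i in range(n - 2 * p + 1))

def minimal_squares(max_period):
    """Step 1: squares uu with |u| <= max_period whose proper factors are square-free."""
    found = []
    for p in range(1, max_period + 1):
        for letters in product(ALPHABET, repeat=p):
            square = "".join(letters) * 2
            if not has_square(square[1:]) and not has_square(square[:-1]):
                found.append(square)
    return found

def build_dfa(antidictionary):
    """Step 2: Aho-Corasick-style automaton, returned as successor lists.

    States are trie nodes; states in which a forbidden word has been read
    are excluded from all successor lists and have no outgoing edges.
    """
    children, terminal = [{}], [False]
    for word in antidictionary:
        s = 0
        for c in word:
            if c not in children[s]:
                children.append({})
                terminal.append(False)
                children[s][c] = len(children) - 1
            s = children[s][c]
        terminal[s] = True
    fail = [0] * len(children)
    delta = [{} for _ in children]
    queue = deque()
    for c in ALPHABET:
        t = children[0].get(c, 0)
        delta[0][c] = t
        if t:
            queue.append(t)
    while queue:                        # breadth-first: shorter prefixes first
        s = queue.popleft()
        terminal[s] = terminal[s] or terminal[fail[s]]
        for c in ALPHABET:
            if c in children[s]:
                t = children[s][c]
                fail[t] = delta[fail[s]][c]
                delta[s][c] = t
                queue.append(t)
            else:
                delta[s][c] = delta[fail[s]][c]
    successors = []
    for s in range(len(children)):
        if terminal[s]:
            successors.append([])
        else:
            successors.append([delta[s][c] for c in ALPHABET
                               if not terminal[delta[s][c]]])
    return successors

def growth_rate(successors, iterations=5000):
    """Step 3: spectral radius of the adjacency matrix by shifted power iteration."""
    v = [1.0] * len(successors)
    rate = 0.0
    for _ in range(iterations):
        # multiply by A + I; the shift prevents oscillation on periodic graphs
        w = [v[s] + sum(v[t] for t in out) for s, out in enumerate(successors)]
        top = max(w)
        rate = top - 1.0                # max(v) == 1, so top estimates rho(A) + 1
        v = [x / top for x in w]
    return rate

for m in (1, 2, 3):
    print(m, round(growth_rate(build_dfa(minimal_squares(m))), 6))
```

For $m = 1$ the automaton only forbids equal adjacent letters, so the printed bound is 2; for $m = 2$ it is the golden ratio. Each larger $m$ gives a bound that is at least the true growth rate of the language, which is the sense in which regular approximations yield upper bounds.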

The main result of § 14 is Algorithm U [196] for upper bounds. The time
and space required by Algorithm U to process a language are approximately
$k!$ times less than the time and space used by the straightforward algorithm.
Such a gain is obtained by using the symmetry of power-free languages. Instead
of a FAD-automaton, the algorithm directly builds a "factor automaton"
which has the same index but a much smaller size. The practical efficiency of
Algorithm U is demonstrated, for example, by Table 1, while the theoretical
one is described by the following

Theorem 19 [196] Suppose that $M$ is the antidictionary of the language
$L(k,\beta)$, $M_m$ is its subset consisting of all words of period at most $m$, and
$N = \bigl(m\beta\, C_{L(k,\beta)}(m)\bigr)/k!$. Then the factor automaton whose index is equal to
the growth rate of the regular approximation $L_m(k,\beta)$ has $O(N)$ vertices and
can be constructed from the triple $(m,\beta,k)$ in time $O(N \log N)$ and space
$O(N)$.

To build the antidictionary in an optimized way, we use 

Theorem 20 [202] Let $1 < \beta < 2$, $xy \in L(k,\beta)$. If the $\beta$-power $(xy)^\beta$ is
not minimal, then the word $(xy)^\beta$ contains a $\beta$-power $(zt)^\beta$ such that
$|(zt)^\beta| < |xy|$. Moreover, if $\beta \le (4/3)^+$, then $|zt| \le |y|$, and if
$\beta \le (5/4)^+$, then $|zt| < |y|$.

Lower bounds for the growth rates of power-free languages cannot be
obtained similarly to the upper bounds, see Proposition 3. But in the case
$\beta \ge 2$, it is possible to use the properties of factor automata to convert the
upper bounds obtained from them to two-sided bounds.

Theorem 21 [190] Suppose that $\beta \ge 2$, $k$ and $m$ are positive integers,
$M_m$ is the set of all words of period $\le m$ from the antidictionary of the
language $L(k,\beta)$, $L_m$ is the regular approximation of $L(k,\beta)$ with the
antidictionary $M_m$, and the FAD-automaton recognizing $L_m$ has a unique
nonsingleton strong component¹³. Then any number $\gamma$ such that
$\gamma + \frac{1}{\gamma^{m-1}(\gamma-1)} \le \mathrm{Gr}(L_m)$
satisfies the inequality $\gamma < \mathrm{Gr}(L(k,\beta))$.

The idea of such a conversion of upper bounds into two-sided ones was
suggested by Kolpakov. In [118, 119] he obtained good enough lower bounds
for the growth rates of $L(3,2)$ and $L(2,3)$. But the method of Kolpakov is
not universal (one should derive approximating formulas for each language
separately) and uses quite time-consuming procedures. Our method is free
from these flaws. It is absolutely universal, because $k$ and $\beta$ are not used in
the calculation of $\gamma$; only the numbers $m$ and $\mathrm{Gr}(L_m)$, provided by Algorithm
U, are needed. In addition, $\gamma$ can be calculated with any precision in
almost constant time. In the particular cases considered by Kolpakov our
bounds are much more precise.

The computer implementation of Algorithm U (together with the attachment
for calculating lower bounds) allowed us to considerably improve all
known bounds of the growth rates for power-free languages and to obtain
lots of previously unknown bounds. Selected results are given in Table 1¹⁴.
All of them are obtained using a PC with a 3.0 GHz CPU and 2 GB of memory.
All bounds are rounded off to 7 digits after the decimal point. If only one
bound is given, then these digits are the same for both lower and upper bounds.

13 Algorithm R performs splitting of the processed dfa into strong components. So, the
latter condition is already checked during the calculation of the growth rate of $L_m$. It
seems probable that this condition is always satisfied, but it is not proved yet.



Table 1: Bounds for the growth rates of $\beta$-free languages with $\beta \ge 2$.

[Table body not recoverable; surviving panel titles: "Small alphabets", "Large alphabets".]


Among all power-free languages, threshold languages are the most
interesting from the complexity point of view. The obtained numerical
results clearly show that the sequences of growth rates of regular
approximations for threshold languages demonstrate the slowest convergence among
all such sequences for power-free languages. Threshold languages are
considered in § 16. To study them, we introduce the notion of $m$-repetition, which is
any word $u^\beta$ belonging to the antidictionary of the considered language and
satisfying the condition $|u^\beta| - |u| = m$. For a threshold language $L(k,\beta)$, we
denote by $L^{(m)}(k)$ its regular approximation whose antidictionary consists
of all $r$-repetitions with $r \le m$. It is not hard to see that the growth rates
of all languages $L^{(2)}(k)$ coincide. A close similarity of the structure of these
languages can be observed using the cylindric representation of words [187, 193].
Such a similarity also takes place for more precise regular approximations,
as the following theorem shows.

Theorem 22 [193] For any fixed integer $m \ge 3$, there exists a finite set
$D_m$ of ternary two-dimensional words of size $O(m) \times O(m)$ such that for any
$k > 2m-3$ a word belongs to the language $L^{(m)}(k)$ if and only if its cylindric
representation has no factors from the set $D_m$.

14 Tables with numerical bounds for the growth rates of different power-free languages
can also be found in [190, 193, 196].

We calculate the sets $D_m$ for $m = 3, 4, 5, 6$ ([187, 193]; the set $D_7$,
calculated by Gorbunova, can also be found in [193]). Using these sets, we
calculate the growth rates of the languages $L^{(m)}(k)$ for different $k$. Analyzing
both these results and the results obtained by the direct use of Algorithm U,
we formulate the following conjecture.

Conjecture 1 ([187]; revised in [193]) The sequence of growth rates of $k$-ary
threshold languages converges to a limit $\alpha \approx 1.242$ as $k$ approaches infinity.

Conjecture 1 naturally follows from the above description of threshold
languages (which is our solution to Problem PF5). It strengthens Dejean's
conjecture¹⁵ and refutes the idea that the growth rates of threshold languages
tend to 1 as the alphabets increase. Currie and Rampersad [57] mention
that when proving Dejean's conjecture they obtained some results supporting
Conjecture 1.

Using Algorithm U and Theorem 21, we obtain numerical bounds for the
growth rates of the languages $L(k,\beta)$ for a wide range of alphabets and
exponents. As a result, we are able to represent the behaviour of the growth
rate as a function $\alpha(k, \beta)$. The empirical laws of behaviour of this function are
presented and then explained in § 17. We derive several asymptotic formulas
for $\alpha(k, \beta)$, thus solving Problem PF4. For the case $\beta > 2$ one has

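Theorem 23 [197] Let $\beta \in [n^+, n+1]$, where $n \ge 2$ is an integer. Then

$$\alpha(k,\beta) = \begin{cases} k - \frac{1}{k^{n-1}} + \frac{1}{k^n} - \frac{1}{k^{2n-2}} + O\left(\frac{1}{k^{2n-1}}\right), & \text{if } \beta \in [n^+, n+\tfrac{1}{2}],\\ k - \frac{1}{k^{n-1}} + \frac{1}{k^n} + O\left(\frac{1}{k^{2n-1}}\right), & \text{if } \beta \in [(n+\tfrac{1}{2})^+, n+1]. \end{cases}$$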

Corollary 4 [197] For any fixed $\beta \ge 2^+$, the difference $(k - \alpha(k,\beta))$
approaches zero at polynomial rate as $k \to \infty$. For any fixed $k \ge 2$, the same
difference approaches zero at exponential rate as $\beta \to \infty$.

Corollary 5 [197] For a fixed $k$, the jumps of the function $\alpha(k, \beta)$ at the
endpoints of the interval $[n^+, n+1]$ are much bigger than the variation of this
function inside this interval. Namely,

$$\alpha(k,n^+) - \alpha(k,n) = \frac{1}{k^{n-2}} + O\left(\frac{1}{k^{n-1}}\right),$$
$$\alpha(k,n+1) - \alpha(k,n^+) = \frac{1}{k^{2n-2}} + O\left(\frac{1}{k^{2n-1}}\right).$$

Next we analyze the behaviour of $\alpha(k,\beta)$ at the point $\beta = 2$.

15 Conjecture 1 was published before Dejean's conjecture was proved.

Proposition 10 [197] The following equalities hold:

$$\alpha(k+1, 2) = k - \frac{1}{k} - \frac{1}{k^3} + O\left(\frac{1}{k^5}\right),$$
$$\alpha(k, 2^+) = k - \frac{1}{k} - \frac{1}{k^3} - \frac{1}{k^4} + O\left(\frac{1}{k^5}\right).$$

Corollary 6 [197] For any $k$, the function $\alpha(k, \beta)$ jumps by more than a
unit at the point $\beta = 2$. Namely, $\alpha(k, 2^+) - \alpha(k, 2) = 1 + \frac{1}{k^2} + O\left(\frac{1}{k^3}\right)$.

Corollary 7 [197] At any point $(k, 2)$, the increment of $k$ by 1 and the
addition of $+$ to the exponent almost equally affect the growth rate of the
power-free language. Namely, $\alpha(k+1, 2) - \alpha(k, 2^+) = \frac{1}{k^4} + O\left(\frac{1}{k^5}\right)$.

All asymptotic formulas given above work well even for small alphabets,
predicting the values from Table 1 with good precision. For $\beta < 2$,
our main results are the following two conjectures. They are based on a
number of partial results and numerical bounds.

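Conjecture 2 [197] The following equalities hold for any fixed integers
$n$, $k$ such that $k > n \ge 3$:

$$\alpha\left(k, \left(\tfrac{n}{n-1}\right)^+\right) = k + 2 - n - \frac{n-1}{k} + O\left(\frac{1}{k^2}\right),$$
$$\alpha\left(k, \tfrac{n}{n-1}\right) = k + 1 - n - \frac{n-1}{k} + O\left(\frac{1}{k^2}\right).$$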

Conjecture 2 predicts that the properties found above for the point $\beta = 2$
hold true for all points $\beta = \frac{n}{n-1}$ such that $2 < n < k$. Indeed, Conjecture 2
implies

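Corollary 8 [197] Let $n$ and $k$ be integers such that $2 < n < k$. Then

$$\alpha\left(k, \left(\tfrac{n}{n-1}\right)^+\right) - \alpha\left(k, \tfrac{n}{n-1}\right) = 1 + O\left(\frac{1}{k^2}\right),$$
$$\alpha\left(k, \tfrac{n}{n-1}\right) - \alpha\left(k, \left(\tfrac{n+1}{n}\right)^+\right) = \frac{1}{k} + O\left(\frac{1}{k^2}\right),$$
$$\alpha\left(k+1, \tfrac{n}{n-1}\right) - \alpha\left(k, \left(\tfrac{n}{n-1}\right)^+\right) = O\left(\frac{1}{k^2}\right).$$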

The second conjecture describes the behaviour of the function $\alpha(k, \beta)$ for
the case when $\beta$ depends on $k$ such that the obtained language is close to a
threshold language.

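Conjecture 3 [197] For any integer $n \ge 0$, the limits

$$\alpha_n = \lim_{k\to\infty} \alpha\left(k, \left(\tfrac{k-n}{k-n-1}\right)^+\right) \quad\text{and}\quad \alpha'_n = \lim_{k\to\infty} \alpha\left(k, \tfrac{k-n}{k-n-1}\right)$$

exist. Moreover, $\alpha'_{n+1} = \alpha_n$ and $\alpha_{n+1} - \alpha_n > 1$.
Note that $\alpha_0 \approx 1.242$ according to Conjecture 1.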

In § 18, we demonstrate how our methods for power-free languages can
be extended to estimate the growth rates of pattern-free and Abelian
power-free languages. Recall that if $u$ is a word, then a word $w$ is said to avoid
the pattern $u$ if there are no homomorphic images of $u$ among the factors of
$w$. A $P$-free language consists of all words (over a given alphabet) avoiding
all patterns from the set $P$. Abelian powers generalize ordinary powers:
two words are considered equal if they are anagrams of each other. Abelian
power-free languages are defined in the same way as power-free languages.
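
Both notions can be made concrete by brute-force checks on finite words. The sketch below assumes, as is usual in pattern avoidance, that the morphisms are non-erasing, so every pattern variable is mapped to a nonempty word. The function names are illustrative, and the procedures are plain exhaustive tests, not the antidictionary-building procedures discussed in the text.

```python
from collections import Counter

def is_image(pattern, word, images=None):
    """True if word = h(pattern) for some non-erasing morphism h."""
    images = images or {}
    if not pattern:
        return word == ""
    x, rest = pattern[0], pattern[1:]
    if x in images:
        img = images[x]
        return word.startswith(img) and is_image(rest, word[len(img):], images)
    # try every nonempty prefix as the image of x, leaving room for the rest
    for cut in range(1, len(word) - len(rest) + 1):
        if is_image(rest, word[cut:], {**images, x: word[:cut]}):
            return True
    return False

def encounters_pattern(word, pattern):
    """True if some factor of word is an image of the pattern."""
    return any(is_image(pattern, word[i:j])
               for i in range(len(word))
               for j in range(i + 1, len(word) + 1))

def has_abelian_power(word, k):
    """True if word has k consecutive blocks of equal length that are anagrams."""
    n = len(word)
    for length in range(1, n // k + 1):
        for i in range(n - k * length + 1):
            blocks = [Counter(word[i + j * length:i + (j + 1) * length])
                      for j in range(k)]
            if all(b == blocks[0] for b in blocks):
                return True
    return False

print(encounters_pattern("abaaba", "xyxxyx"))   # True, with x -> a, y -> b
print(has_abelian_power("abcacb", 2))           # True: abc and acb are anagrams
print(has_abelian_power("abacaba", 2))          # False
```

A word avoids a set $P$ of patterns when `encounters_pattern` is false for every pattern in $P$.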

All pattern-free and Abelian power-free languages are symmetric. As a 
result, we can apply Algorithm U to such languages in order to get upper 
bounds for their growth rates. Only the procedure building the antidictionary 
should be appropriately changed. We develop such a universal procedure for 
Abelian power-free languages in a joint work with Samsonov [200]. On the
other hand, such procedures for pattern-free languages heavily depend on the
avoided patterns. So, we focus on two particular binary languages; they
avoid two very similar sets $P_1 = \{xxyxxy, xxx\}$ and $P_2 = \{xyxxyx, xxx\}$,
respectively. For the language avoiding $P_1$, we adapt Theorem 21 to find a
sharp two-sided bound of the growth rate. This rate is about 1.0989. In
contrast, we prove, using a modification of Algorithm U, that the second
mentioned language is finite (and thus has zero growth rate). These results
are contained in [204].

Chapter 4 (§§ 19-20). Languages of minimal powers

This chapter is closely connected to the previous one, because the study of 
factorial languages is impossible without paying attention to their antidic- 
tionaries. A word w from the antidictionary of a factorial language L has 
nearly the same structure as the words from L, because all proper factors of 
w belong to L. On the other hand, the antidictionary of L has a structure
completely different from that of L. Obviously, both similarity and difference
affect complexity. 

The complexity of any language of minimal powers has some "trivial" zeroes
(for example, a square cannot have odd length). In order to exclude trivial
zeroes from consideration, we introduce a version of combinatorial complexity
called root complexity. The root complexity $R_{k,\beta}(n)$ of the language of
minimal $k$-ary $\beta$-powers returns the number of such powers of period $n$. If
$u^\beta$ is a minimal $\beta$-power, then the word $u$ is $\beta$-free. Hence, the complexity of
a power-free language exceeds the root complexity of its antidictionary. But
the numerical results show that the growth rates of these two complexities
are very close for any power-free language.

The root complexity of an antidictionary behaves much less regularly than
the complexity of the corresponding factorial language. That is why we
study root complexity mostly within the bounds of Problem MP. Zeroes of
the function $R_{k,\beta}(n)$ are exactly the "forbidden" periods for minimal $k$-ary
$\beta$-powers.

Problem 1.12 of [8] asks about zeroes and behaviour of the function
$R_{3,2}(n)$. We study this problem in § 19. Minimal squares are closely
connected to square-free circular words, as the following proposition shows.
Recall that a circular word is just a cyclic sequence of letters; the factors of a
circular word are usual words. Any word can be transformed to its circular
closure by linking up the ends together.

Proposition 11 [198, 199] A word of the form $u^2$ is a minimal square if
and only if the circular closure of $u$ is square-free.

Due to Proposition 11, we formulate the main result of § 19 both in terms
of circular words and in terms of root complexity.

Theorem 24 [199] (1) A ternary square-free circular word of length $n$
(1a) exists if and only if $n \notin \{5, 7, 9, 10, 14, 17\}$;

(1b) is unique¹⁶ if and only if $n \in \{1, 2, 3, 4, 6, 8, 11, 12, 13, 15, 16, 21\}$.
(2) The number of ternary square-free circular words of length $n$ depends on
$n$ exponentially.
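
Proposition 11 and statement (1a) can be checked on short words by exhaustive search. The sketch below treats a circular word as square-free if none of its conjugates contains a square, which covers all factors of length at most $n$; this reading of the definition, and all function names, are assumptions made for the example. The search is a plain enumeration and has nothing in common with the computer-free proof described in the text.

```python
from itertools import product

def has_square(w):
    """True if w contains a nonempty factor of the form xx."""
    n = len(w)
    return any(w[i:i + p] == w[i + p:i + 2 * p]
               for p in range(1, n // 2 + 1)
               for i in range(n - 2 * p + 1))

def circular_square_free(u):
    """True if no conjugate of u, hence no factor of the circular word, has a square."""
    doubled = u + u
    return not any(has_square(doubled[i:i + len(u)]) for i in range(len(u)))

def is_minimal_square(u):
    """True if uu is a square all of whose proper factors are square-free."""
    square = u + u
    return not has_square(square[1:]) and not has_square(square[:-1])

def lengths_without_circular_words(max_len, alphabet="abc"):
    """Lengths n <= max_len admitting no square-free circular word."""
    return [n for n in range(1, max_len + 1)
            if not any(circular_square_free("".join(t))
                       for t in product(alphabet, repeat=n))]

# Proposition 11 on all ternary words of length at most 6
words = ("".join(t) for n in range(1, 7) for t in product("abc", repeat=n))
print(all(is_minimal_square(u) == circular_square_free(u) for u in words))

# exceptional lengths up to 10, to be compared with statement (1a)
print(lengths_without_circular_words(10))
```

The enumeration grows as $3^n$, so the remaining exceptional lengths 14 and 17 are out of comfortable reach for this naive search.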

Corollary 9 The function $R_{3,2}(n)$ is exponential. Moreover,

$$R_{3,2}(n) = \begin{cases} 0, & \text{if } n = 5, 7, 9, 10, 14, 17,\\ 3, & \text{if } n = 1,\\ 6, & \text{if } n = 2, 3, 4, 6, 8, 11, 12, 13, 15, 16, 21, \end{cases}$$

and $R_{3,2}(n) \ge 12$ otherwise.

The statement (1a) of Theorem 24 was first proved by Currie [54] with the
aid of a relatively long computer search. As a result, the proof by Currie cannot
clarify the structure of ternary square-free circular words or the nature of the
exceptions found. We give a computer-free proof, revealing an interesting
connection between ternary square-free circular words and closed walks in
a weighted graph. All statements of Theorem 24 are proved in parallel
and all exceptions are made visual.

The dissertation is concluded by § 20, in which zeroes of an arbitrary
function $R_{k,\beta}(n)$ are described. The results are formulated in terms of
permitted/forbidden periods of minimal powers. Recall that words $u$ and $v$ are
conjugates if $u = yz$ and $v = zy$ for some $y$ and $z$. Minimal powers over the
binary alphabet are described in the following theorem.

Theorem 25 [198] A binary minimal $\beta$-power of period $p$
(1) exists for any positive integer $p$ if $\beta \ge (5/2)^+$;

(2) exists for any positive integer $p \notin \{5, 9, 11, 17, 18\}$ if $\beta \in [(7/3)^+, 5/2]$;

(3) is a power of a conjugate of the word $\theta^m(a)$, $\theta^m(b)$, $\theta^m(aba)$, or $\theta^m(bab)$
for some $m \ge 0$, where $\theta$ is the Thue-Morse morphism, if $\beta \in [2^+, 7/3]$; in
particular, $p = 2^m$ or $p = 3 \cdot 2^m$.

16 up to renaming the letters.

Proving the second statement of Theorem 25, we finalize the description
of possible lengths of binary $\beta$-free circular words (all other values of $\beta$ were
studied by Aberkane and Currie [1, 2]). Note that the lists of exceptions
in statement (2) of Theorem 25 and in the following corollary are slightly
different.

Corollary 10 [198] Let $\beta \in [(7/3)^+, 5/2]$. A binary $\beta$-free circular word
of length $n$ exists if and only if $n \notin \{5, 9, 11, 18\}$.

Next we move to the alphabets with more than two letters. 

Theorem 26 [198] Any positive integer is a period of some minimal $k$-ary
$\beta$-power if one of the following conditions holds:
(1) $k \ge 4$ and $\beta = 2$; (2) $k \ge 3$ and $\beta \ge 2^+$; (3) $\beta > RT(\lfloor k/2 \rfloor)$.

On the other hand, forbidden periods exist when $\beta \approx RT(k)$. Some of
them are listed in the following theorem.

Theorem 27 [198] There exists no minimal $k$-ary $\beta$-power of period $p$ if
one of the following conditions holds:

(1) (3 G [^zr + ? §z§] and p satisfies one of the restrictions 

(a) k < p < (k—1) and p mod k 0, 

(b) p G [(m— 2)(fc+l)+l, m(k— 1)— 1] for some integer m ^ [^jr] ana " 
p mod k 7^ 0, 

(c) p = 3k or p = 4k; 

(2) /3 G [fr| + ,fz§] and p G [(m-l)(A;+l)+l, m(A;-2)-l] for some integer 
me[2,fc-2]; 

(3) fc ^ 9, /3 G [|Ef + , fcf ] , <m<* p = 2k-7. 

Corollary 11 [198] For (3 G [^ + , fE§] , and also for (3 G |e|] m 

the case $k \ge 7$, the minimal $k$-ary $\beta$-powers of period $p$ do not exist for $\Theta(k^2)$
different values of $p$.

Since the existence of a minimal $k$-ary $\beta$-power of period $p$ is decidable
for any fixed triple $(k, \beta, p)$, we can add the results of computer check to the
above theorems. Finally, we get the following general conjecture about the
existence and distribution of forbidden periods.

Conjecture 4 [198] Let $k \ge 3$, $\beta > RT(k)$.
(1) For a pair $(k, \beta)$, there exists a forbidden period if and only if one of the
following conditions is satisfied:
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(b) k = 6 and /9 G [f + , |], or jfe ^ 7 and fi G [|^ + , jgj] ; 

(c) k ^ 9 and (3 G [§ff§ + , fzf] ■ 

(2) For any pair $(k, \beta)$, the set of forbidden periods is finite.

(3) If $k \ge 9$, then any period $p \ge k(k-1)$ is permitted for any pair $(k, \beta)$.

We conclude with two short comments on this conjecture. First, the
intervals for $\beta$ mentioned in statement (1) coincide with the intervals
mentioned in Theorem 27. Second, the bound on $p$ in statement (3) is the best
possible, because the period k(k— 1)— 1 is forbidden for any j3 ^ by 
Theorem 27(1).